The Binomial Distribution

One of the simplest and most common examples of a random phenomenon is a coin flip: an event that is either “yes” or “no” with some probability. Here you’ll learn about the binomial distribution, which describes the number of “yes” outcomes across a fixed number of independent yes/no trials, and how to predict and simulate its behavior.

Simulating Coin Flips

In these exercises, you’ll practice using the rbinom() function, which draws random values from a binomial distribution. With a size of 1, each draw is a single “flip” that is either 1 (“heads”) or 0 (“tails”).

Instructions

With one line of code, simulate 10 coin flips, each with a 30% chance of coming up 1 (“heads”).

Solution

# Generate 10 separate random flips with probability .3
rbinom(10, 1, .3)
## [1] 1 0 0 0 1 1 0 0 0 0

Simulating Draws from a Binomial

In the last exercise, you simulated 10 separate coin flips, each with a 30% chance of heads. Thus, with rbinom(10, 1, .3) you ended up with 10 outcomes that were either 0 (“tails”) or 1 (“heads”).

But by changing the second argument of rbinom() (currently 1), you can flip multiple coins within each draw. For example, setting it to 10 makes each outcome a number between 0 and 10, showing how many of that draw’s 10 flips came up heads.
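One way to see what the size argument does: a single draw with size = 10 counts the heads among 10 independent flips, so adding up 10 separate size-1 flips per draw produces the same kind of number. The sketch below uses only base R; the seed and the matrix layout are illustrative choices, not part of the exercise.

```r
# Illustrative seed so the result is reproducible
set.seed(123)

# 100 draws, each made of 10 separate size-1 flips (one row per draw)
flips <- matrix(rbinom(100 * 10, 1, 0.3), nrow = 100, ncol = 10)

# Counting the heads in each row gives one Binom(10, .3) value per draw
heads_per_draw <- rowSums(flips)

# The same kind of counts, drawn directly with size = 10
direct_draws <- rbinom(100, 10, 0.3)

# Both stay between 0 and 10
range(heads_per_draw)
range(direct_draws)
```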

Instructions

Use the rbinom() function to simulate 100 separate occurrences of flipping 10 coins, where each coin has a 30% chance of coming up heads.

Solution

# Generate 100 occurrences of flipping 10 coins, each with 30% probability
rbinom(100, 10, .3)
##   [1] 2 2 0 4 4 5 3 2 3 2 2 3 4 4 3 4 6 1 2 1 1 3 3 2 4 3 2 3 3 2 3 4 2 2 4
##  [36] 0 2 5 2 2 6 3 1 3 1 2 1 2 3 2 3 2 1 2 5 3 4 3 2 3 3 6 3 2 1 3 1 5 2 2
##  [71] 4 5 0 3 1 3 3 4 0 3 3 4 3 4 5 5 4 1 1 4 1 4 2 4 3 4 3 2 1 2

Calculating Density of a Binomial

If you flip 10 coins each with a 30% probability of coming up heads, what is the probability exactly 2 of them are heads?

Instructions

Answer the above question using the dbinom() function. This function takes almost the same arguments as rbinom(). The second and third arguments are size and prob, but now the first argument is x (the number of heads) instead of n (the number of draws). Use x to specify where you want to evaluate the binomial density.

Confirm your answer using the rbinom() function by creating a simulation of 10,000 trials. Put this all on one line by wrapping the mean() function around the rbinom() function.

Solution

# Calculate the probability that 2 are heads using dbinom
dbinom(2, 10, .3)
## [1] 0.2334744
# Confirm your answer with a simulation using rbinom
mean(rbinom(10000, 10, .3) == 2)
## [1] 0.2391
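The exact value that dbinom() returns comes from the binomial probability formula, P(X = x) = choose(size, x) * prob^x * (1 - prob)^(size - x). A minimal base-R check that reproduces dbinom(2, 10, .3) by hand:

```r
# Probability of exactly x heads in size flips
x <- 2
size <- 10
prob <- 0.3

# choose(10, 2) = 45 ways to place the 2 heads among the 10 flips
by_hand <- choose(size, x) * prob^x * (1 - prob)^(size - x)

by_hand                                     # 0.2334744
all.equal(by_hand, dbinom(x, size, prob))   # TRUE
```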

Calculating the Cumulative Distribution of a Binomial

If you flip ten coins that each have a 30% probability of heads, what is the probability at least five are heads?

Instructions

Answer the above question using the pbinom() function. Note that you can compute the probability that the number of heads is less than or equal to 4, then subtract that probability from 1.

Confirm your answer with a simulation of 10,000 trials by finding the fraction of trials that result in 5 or more heads.

Solution

# Calculate the probability that at least five coins are heads
1 - pbinom(4, 10, .3)
## [1] 0.1502683
# Confirm your answer with a simulation of 10,000 trials
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1497
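The same upper-tail probability can be reached in two other ways in base R: by adding up the individual dbinom() values for 5 through 10 heads, or by asking pbinom() for the upper tail directly with lower.tail = FALSE. Note that lower.tail = FALSE returns P(X > q), so q stays at 4.

```r
# Complement of the lower tail: P(X >= 5) = 1 - P(X <= 4)
p_complement <- 1 - pbinom(4, 10, 0.3)

# Sum of the exact probabilities of 5, 6, ..., 10 heads
p_sum <- sum(dbinom(5:10, 10, 0.3))

# Upper tail directly: lower.tail = FALSE gives P(X > 4)
p_upper <- pbinom(4, 10, 0.3, lower.tail = FALSE)

# All three agree
c(p_complement, p_sum, p_upper)
```

For very small tail probabilities, lower.tail = FALSE is generally the safer choice, because 1 - pbinom() can lose precision when pbinom() is very close to 1.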

Varying the Number of Trials

In the last exercise you flipped ten coins, each with a 30% probability of heads, to find the probability that at least five are heads. You found that the exact answer was 1 - pbinom(4, 10, .3) = 0.1502683, then confirmed it with 10,000 simulated trials.

Did you need all 10,000 trials to get an accurate answer? Would your answer have been more accurate with more trials?

Instructions

Try answering this question with simulations of 100, 1,000, 10,000, and 100,000 trials, so you can see which comes closest to the exact answer.

Solution

# Here is how you computed the answer in the last problem
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1437
# Try now with 100, 1000, 10,000, and 100,000 trials
mean(rbinom(100, 10, .3) >= 5)
## [1] 0.15
mean(rbinom(1000, 10, .3) >= 5)
## [1] 0.156
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1501
mean(rbinom(100000, 10, .3) >= 5)
## [1] 0.14969
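A rough way to answer “did you need all 10,000 trials?” is the standard error of a simulated proportion, sqrt(p * (1 - p) / n), which shrinks with the square root of the number of trials. This is a general result for proportions rather than something computed in the exercise; the sketch below simply plugs in the exact p.

```r
# Exact probability of at least five heads
p_exact <- 1 - pbinom(4, 10, 0.3)

# Number of simulated trials in each run
n_trials <- c(100, 1000, 10000, 100000)

# Typical size of the simulation error for each number of trials
std_error <- sqrt(p_exact * (1 - p_exact) / n_trials)

data.frame(n_trials, std_error = signif(std_error, 2))
```

Multiplying the number of trials by 100 only cuts the typical error by a factor of 10: with 100 trials an estimate can be off by a few percentage points, while with 100,000 trials the typical error is around 0.001.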

Calculating the Expected Value

What is the expected value of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?

Instructions

Calculate this using the exact formula you learned in the lecture: the expected value of the binomial is size * p. Print this result to the screen.

Confirm with a simulation of 10,000 draws from the binomial.

Solution

# Calculate the expected value using the exact formula
25 * .3
## [1] 7.5
# Confirm with a simulation using rbinom
mean(rbinom(10000, 25, .3))
## [1] 7.5268
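The formula size * p agrees with the definition of expected value: the sum of every possible outcome weighted by its probability. A quick base-R check for 25 flips with p = .3:

```r
size <- 25
p <- 0.3

# Every possible number of heads, 0 through 25
outcomes <- 0:size

# Expected value: sum of outcome * probability of that outcome
expected_by_definition <- sum(outcomes * dbinom(outcomes, size, p))

expected_by_definition                      # 7.5
all.equal(expected_by_definition, size * p) # TRUE
```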

Calculating the Variance

What is the variance of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?

Instructions

Calculate this using the exact formula you learned in the lecture: the variance of the binomial is size * p * (1 - p). Print this result to the screen.

Confirm with a simulation of 10,000 trials.

Solution

# Calculate the variance using the exact formula
25 * .3 * (1 - .3)
## [1] 5.25
# Confirm with a simulation using rbinom
var(rbinom(10000, 25, .3))
## [1] 5.278581
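Likewise, size * p * (1 - p) matches the definition of variance, the probability-weighted average squared distance from the mean. A base-R check for the same distribution:

```r
size <- 25
p <- 0.3
outcomes <- 0:size
probs <- dbinom(outcomes, size, p)

# Mean of the distribution (7.5 here)
mu <- sum(outcomes * probs)

# Variance: probability-weighted average squared distance from the mean
variance_by_definition <- sum((outcomes - mu)^2 * probs)

all.equal(variance_by_definition, size * p * (1 - p))   # TRUE
```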

Laws of Probability

In this chapter you’ll learn to combine multiple probabilities, such as the probability two events both happen or that at least one happens, and confirm each with random simulations. You’ll also learn some of the properties of adding and multiplying random variables.

Solving for Probability of A and B

If events A and B are independent, event A has a 40% chance of happening, and event B has a 20% chance of happening, what is the probability that they both happen?

Hint: To find the probability that independent events A and B both happen, multiply their probabilities.
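Applying the hint, the exact answer is 0.4 × 0.2 = 0.08, which the simulation in the next exercise should approximate:

```r
p_A <- 0.4
p_B <- 0.2

# Independent events: multiply the individual probabilities
p_A_and_B <- p_A * p_B
p_A_and_B   # 0.08
```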

Simulating the Probability of A and B

You can also use simulation to estimate the probability of two events both happening.

Instructions

Randomly simulate 100,000 flips of coin A, each of which has a 40% chance of being heads. Save this as a variable A.

Randomly simulate 100,000 flips of coin B, each of which has a 20% chance of being heads. Save this as a variable B.

Use the “and” operator (&) to combine the variables A and B to estimate the probability that both A and B are heads.

Solution

# Simulate 100,000 flips of a coin with a 40% chance of heads
A <- rbinom(100000, 1, .4)

# Simulate 100,000 flips of a coin with a 20% chance of heads
B <- rbinom(100000, 1, .2)

# Estimate the probability both A and B are heads
mean(A & B)
## [1] 0.07967

Simulating the Probability of A, B, and C

Randomly simulate 100,000 flips of A (40% chance), B (20% chance), and C (70% chance). What fraction of the time do all three coins come up heads?

Instructions

You’ve already simulated A and B. Now simulate 100,000 flips of coin C, where each has a 70% chance of coming up heads.

Use A, B, and C to estimate the probability that all three coins would come up heads.

Solution

# You've already simulated 100,000 flips of coins A and B
A <- rbinom(100000, 1, .4)
B <- rbinom(100000, 1, .2)

# Simulate 100,000 flips of coin C (70% chance of heads)
C <- rbinom(100000, 1, .7)

# Estimate the probability A, B, and C are all heads
mean(A & B & C)
## [1] 0.05593
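For comparison, independence lets you multiply all three probabilities, so the exact answer is 0.4 × 0.2 × 0.7 = 0.056, close to the simulated value:

```r
# Exact probability that three independent coins all come up heads
p_all_three <- 0.4 * 0.2 * 0.7
p_all_three   # 0.056
```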

Solving for the Probability of A or B

If coins A and B are independent, coin A has a 60% chance of coming up heads, and coin B has a 10% chance of coming up heads, what is the probability that either A or B comes up heads?

Hint: The probability of A or B happening (when A and B are independent, as they are here) is P(A) + P(B) - P(A) * P(B).
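The hint’s formula gives 0.6 + 0.1 - 0.6 × 0.1 = 0.64. An equivalent route is to subtract from 1 the probability that neither coin comes up heads:

```r
p_A <- 0.6
p_B <- 0.1

# Inclusion-exclusion for independent events: P(A) + P(B) - P(A) * P(B)
p_A_or_B <- p_A + p_B - p_A * p_B

# Equivalent: 1 minus the probability that neither coin is heads
p_neither <- (1 - p_A) * (1 - p_B)
p_A_or_B_complement <- 1 - p_neither

c(p_A_or_B, p_A_or_B_complement)   # both 0.64
```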

Simulating Probability of A or B

In the last exercise, you found that there was a 64% chance that either coin A (60% chance) or coin B (10% chance) would come up heads. Now you’ll confirm that answer using simulation.

Instructions

Use rbinom() to simulate 100,000 flips of coin A, each having a 60% chance of being heads.

Use rbinom() to simulate 100,000 flips of coin B, each having a 10% chance of being heads.

Use these to estimate the probability that A or B is heads.

Solution

# Simulate 100,000 flips of a coin with a 60% chance of heads
A <- rbinom(100000, 1, .6)

# Simulate 100,000 flips of a coin with a 10% chance of heads
B <- rbinom(100000, 1, .1)

# Estimate the probability either A or B is heads
mean(A | B)
## [1] 0.63986

Probability Either Variable is Less Than or Equal to 4

Suppose X is a random Binom(10, .6) variable (10 flips of a coin with 60% chance of heads) and Y is a random Binom(10, .7) variable (10 flips of a coin with a 70% chance of heads), and they are independent.

What is the probability that either of the variables is less than or equal to 4?

Instructions

Simulate 100,000 draws from each of X (10 coins, 60% chance of heads) and Y (10 coins, 70% chance of heads) binomial variables, saving them as X and Y respectively.

Use these simulations to estimate the probability that either X or Y is less than or equal to 4.

Use the pbinom() function to calculate the exact probability that X is less than or equal to 4, then the probability that Y is less than or equal to 4.

Combine these two exact probabilities to calculate the exact probability that either variable is less than or equal to 4.

Solution

# Use rbinom to simulate 100,000 draws from each of X and Y
X <- rbinom(100000, 10, .6)
Y <- rbinom(100000, 10, .7)

# Estimate the probability either X or Y is <= 4
mean(X <= 4 | Y <= 4)
## [1] 0.20478
# Use pbinom to calculate the probabilities separately
prob_X_less <- pbinom(4, 10, .6)
prob_Y_less <- pbinom(4, 10, .7)

# Combine these to calculate the exact probability either <= 4
prob_X_less + prob_Y_less - prob_X_less * prob_Y_less
## [1] 0.2057164

Expected Value of Multiplying a Random Variable

If X is a binomial with size 50 and p = .4, what is the expected value of 3*X?

Hint: The expected value of a binomial is size * p, and the expected value of k * X is k * E[X].
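Following the hint: E[X] = 50 × 0.4 = 20, so E[3 * X] = 3 × 20 = 60. In R:

```r
size <- 50
p <- 0.4
k <- 3

# Expected value of the binomial itself: size * p
expected_X <- size * p        # 20

# Multiplying by a constant scales the expected value: E[k * X] = k * E[X]
expected_kX <- k * expected_X
expected_kX                   # 60
```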

Simulating Multiplying a Random Variable

In this exercise you’ll use simulation to confirm the rule you just learned about how multiplying a random variable by a constant affects its expected value.

Instructions

Simulate 100,000 draws of X, a binomial random variable with size 20 and p = .1. Save this as X.

Use this simulation to estimate the expected value of X.

Use this simulation to estimate the expected value of 5*X, as well.

Solution

# Simulate 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000, 20, .1)

# Estimate the expected value of X
mean(X)
## [1] 2.00379
# Estimate the expected value of 5 * X
mean(5 * X)
## [1] 10.01895

Variance of a Multiplied Random Variable

In the last exercise you simulated X from a binomial with size 20 and p = .1 and now you’ll use this same simulation to explore the variance.

Instructions

Use this simulation to estimate the variance of X.

Estimate the variance of 5 * X.

Solution

# X is simulated from 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000, 20, .1)

# Estimate the variance of X
var(X)
## [1] 1.797931
# Estimate the variance of 5 * X
var(5 * X)
## [1] 44.94829
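The simulated values line up with a rule that parallels the one for expected values, except that the constant is squared: Var(k * X) = k^2 * Var(X). For size 20 and p = .1, the exact variances are 20 × 0.1 × 0.9 = 1.8 and 25 × 1.8 = 45:

```r
size <- 20
p <- 0.1
k <- 5

# Exact variance of the binomial: size * p * (1 - p)
var_X <- size * p * (1 - p)   # 1.8

# Scaling by a constant multiplies the variance by the constant squared
var_kX <- k^2 * var_X
var_kX                        # 45
```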

Solving for the Sum of Two Binomial Variables

If X is drawn from a binomial with size 20 and p = .3, and Y from size 40 and p = .1, what is the expected value (mean) of X + Y?

Hint: Compute the expected value of X and the expected value of Y separately, then add them together.
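Following the hint: E[X] = 20 × 0.3 = 6 and E[Y] = 40 × 0.1 = 4, so E[X + Y] = 10.

```r
# Expected value of each binomial: size * p
expected_X <- 20 * 0.3   # 6
expected_Y <- 40 * 0.1   # 4

# Expected values add, whether or not X and Y are independent
expected_sum <- expected_X + expected_Y
expected_sum             # 10
```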

Simulating Adding Two Binomial Variables

In the last exercise, you found the expected value of the sum of two binomials. In this problem you’ll use a simulation to confirm your answer.

Instructions

Simulate 100,000 draws from X, a binomial with size 20 and p = .3, and Y, with size 40 and p = .1.

Use this simulation to estimate the expected value of X + Y.

Solution

# Simulate 100,000 draws of X (size 20, p = .3) and Y (size 40, p = .1)
X <- rbinom(100000, 20, .3)
Y <- rbinom(100000, 40, .1)

# Estimate the expected value of X + Y
mean(X + Y)
## [1] 9.99454

Simulating Variance of Sum of Two Binomial Variables

In the last multiple choice exercise, you examined the expected value of the sum of two binomials. Here you’ll estimate the variance.

Instructions

Use your simulation of the variables X and Y to estimate the variance of X + Y. Use your simulation to estimate the variance of 3 * X + Y.

Solution

# Simulation from last exercise of 100,000 draws from X and Y
X <- rbinom(100000, 20, .3)
Y <- rbinom(100000, 40, .1)

# Find the variance of X + Y
var(X + Y)
## [1] 7.816989
# Find the variance of 3 * X + Y
var(3 * X + Y)
## [1] 41.64792
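These estimates can be compared with exact values. For independent X and Y (as in this simulation), variances add, and a constant multiplier is squared: Var(X + Y) = Var(X) + Var(Y) and Var(3 * X + Y) = 9 * Var(X) + Var(Y). That gives 7.8 and 41.4:

```r
# Exact variance of each binomial: size * p * (1 - p)
var_X <- 20 * 0.3 * 0.7   # 4.2
var_Y <- 40 * 0.1 * 0.9   # 3.6

# Independent variables: variances add
var_sum <- var_X + var_Y
var_sum                   # 7.8

# A constant multiplier is squared before adding
var_3x_plus_y <- 3^2 * var_X + var_Y
var_3x_plus_y             # 41.4
```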

Bayesian Statistics

Updating

Suppose you have a coin that is equally likely to be fair (50% heads) or biased (75% heads). You then flip the coin 20 times and see 11 heads.

Without doing any math, which do you now think is more likely: that the coin is fair, or that the coin is biased?

Updating with Simulation

We see 11 heads out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair? Answer this by simulating 50,000 fair coins and 50,000 biased coins.

Instructions

Simulate 50,000 cases of flipping 20 coins from a fair coin (50% chance of heads), as well as from a biased coin (75% chance of heads). Save these variables as fair and biased respectively.

Find the number of fair coins where exactly 11/20 came up heads, then the number of biased coins where exactly 11/20 came up heads. Save them as fair_11 and biased_11 respectively.

Among all coins that came up heads exactly 11 times, find the fraction that were fair coins: this is the posterior probability that a coin with 11/20 heads is fair.

Solution

# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <- rbinom(50000, 20, .5)
biased <- rbinom(50000, 20, .75)

# How many fair cases, and how many biased, led to exactly 11 heads?
fair_11 <- sum(fair == 11)
biased_11 <- sum(biased == 11)

# Fraction of all coins with exactly 11 heads that were fair
fair_11 / (fair_11 + biased_11)
## [1] 0.8601563

Updating After 16 Heads

Suppose that when you flip a different coin (that could either be fair or biased) 20 times, you see 16 heads.

Without doing any math, which do you now think is more likely: that this coin is fair, or that it’s biased?

Updating with Simulation After 16 Heads

We see 16 heads out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair?

Instructions

Simulate 50,000 cases of flipping 20 coins from a fair coin (50% chance of heads), as well as from a biased coin (75% chance of heads). Save these variables as fair and biased respectively.

Find the number of fair coins where exactly 16/20 came up heads, then the number of biased coins where exactly 16/20 came up heads. Save them as fair_16 and biased_16 respectively.

Among all coins that came up heads exactly 16 times, print the fraction that were fair coins: this is the posterior probability that a coin with 16/20 heads is fair.

Solution

# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <- rbinom(50000, 20, .5)
biased <- rbinom(50000, 20, .75)

# How many fair cases, and how many biased, led to exactly 16 heads?
fair_16 <- sum(fair == 16)
biased_16 <- sum(biased == 16)

# Fraction of all coins with exactly 16 heads that were fair
fair_16 / (fair_16 + biased_16)
## [1] 0.02384546

Updating with Priors

We see 14 out of 20 flips are heads, and start with an 80% chance the coin is fair and a 20% chance it is biased (75% chance of heads).

You’ll solve this case with simulation, by starting with a “bucket” of 10,000 coins, where 8,000 are fair and 2,000 are biased, and flipping each of them 20 times.

Instructions

Simulate 8,000 trials of flipping a fair coin 20 times and 2,000 trials of flipping a biased coin 20 times. Save them as fair_flips and biased_flips, respectively.

Find the number of cases that resulted in 14 heads from each coin, saving them as fair_14 and biased_14 respectively.

Find the fraction of all coins that resulted in 14 heads that were fair: this is an estimate of the posterior probability that the coin is fair.

Solution

# Simulate 8000 cases of flipping a fair coin, and 2000 of a biased coin
fair_flips <- rbinom(8000, 20, .5)
biased_flips <- rbinom(2000, 20, .75)

# Find the number of cases from each coin that resulted in 14/20
fair_14 <- sum(fair_flips == 14)
biased_14 <- sum(biased_flips == 14)

# Use these to estimate the posterior probability
fair_14 / (fair_14 + biased_14)
## [1] 0.4791667
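The simulated estimate can be checked against an exact calculation: weight each coin’s dbinom() likelihood of 14/20 heads by its prior (0.8 for fair, 0.2 for biased) and normalize. This works out to about 0.467, so the simulation above is in the right neighborhood.

```r
# Prior probabilities for each coin
prior_fair <- 0.8
prior_biased <- 0.2

# Likelihood of 14 heads in 20 flips under each coin
likelihood_fair <- dbinom(14, 20, 0.5)
likelihood_biased <- dbinom(14, 20, 0.75)

# Posterior: prior-weighted likelihood of fair, divided by the total
posterior_fair_14 <- prior_fair * likelihood_fair /
  (prior_fair * likelihood_fair + prior_biased * likelihood_biased)
posterior_fair_14
```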

Updating with Three Coins

Suppose that instead of a coin being either fair or biased, there are three possibilities: the coin is fair (50% heads), biased low (25% heads), or biased high (75% heads). There is an 80% chance it is fair, a 10% chance it is biased low, and a 10% chance it is biased high.

You see 14/20 flips are heads. What is the probability that the coin is fair?

Instructions

Use the rbinom() function to simulate 80,000 draws from the fair coin, 10,000 draws from the high coin, and 10,000 draws from the low coin, with each draw containing 20 flips. Save them as flips_fair, flips_high, and flips_low, respectively.

For each of these types, compute the number of coins that resulted in 14. Save them as fair_14, high_14, and low_14, respectively.

Find the posterior probability that the coin was fair by dividing the number of fair coins resulting in 14 heads by the total number of coins resulting in 14 heads.

Solution

# Simulate 80,000 draws from fair coin, 10,000 from each of high and low coins
flips_fair <- rbinom(80000, 20, .5)
flips_high <- rbinom(10000, 20, .75)
flips_low <- rbinom(10000, 20, .25)

# Compute the number of coins that resulted in 14 heads from each of these piles
fair_14 <- sum(flips_fair == 14)
high_14 <- sum(flips_high == 14)
low_14 <- sum(flips_low == 14)

# Compute the posterior probability that the coin was fair
fair_14 / (fair_14 + high_14 + low_14)
## [1] 0.6359348
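The exact version of this three-coin calculation follows the same pattern with vectors: multiply each prior by its dbinom() likelihood, then divide by the total so the posteriors sum to 1. The posterior for the fair coin comes out at about 0.637. The names in the vectors below are labels chosen for this sketch.

```r
# Prior probability and heads probability for each kind of coin
priors <- c(fair = 0.8, low = 0.1, high = 0.1)
heads_prob <- c(fair = 0.5, low = 0.25, high = 0.75)

# Likelihood of 14 heads in 20 flips for each coin (dbinom is vectorized over prob)
likelihoods <- dbinom(14, 20, heads_prob)

# Bayes' theorem: normalize the prior-weighted likelihoods so they sum to 1
posterior <- priors * likelihoods / sum(priors * likelihoods)
round(posterior, 4)
```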

Updating with Bayes Theorem

In this chapter, you used simulation to estimate the posterior probability that a coin that resulted in 11 heads out of 20 is fair. Now you’ll calculate it again, this time using the exact probabilities from dbinom(). There is a 50% chance the coin is fair and a 50% chance the coin is biased.

Instructions

Use the dbinom() function to calculate the exact probability of getting 11 heads out of 20 flips with a fair coin (50% chance of heads) and with a biased coin (75% chance of heads). Save them as probability_fair and probability_biased, respectively.

Use these to calculate the posterior probability that the coin is fair. Because the two priors are equal, this is the probability of getting 11 heads from a fair coin divided by the sum of the two probabilities.

Solution

# Use dbinom to calculate the probability of 11/20 heads with fair or biased coin
probability_fair <- dbinom(11, 20, .5)
probability_biased <- dbinom(11, 20, .75)

# Calculate the posterior probability that the coin is fair
probability_fair / (probability_fair + probability_biased)
## [1] 0.8554755

Updating for Other Outcomes

In the last exercise, you solved for the probability that the coin is fair if it results in 11 heads out of 20 flips, assuming that beforehand there was an equal chance of it being a fair coin or a biased coin. Recall that the code looked something like:

probability_fair <- dbinom(11, 20, .5)
probability_biased <- dbinom(11, 20, .75)
probability_fair / (probability_fair + probability_biased)

Now you’ll use the same dbinom() approach to find the posterior probability for two other outcomes.

Instructions

Find the probability that a coin resulting in 14 heads out of 20 flips is fair.

Find the probability that a coin resulting in 18 heads out of 20 flips is fair.

Solution

# Find the probability that a coin resulting in 14/20 is fair
dbinom(14, 20, .5) / (dbinom(14, 20, .5) + dbinom(14, 20, .75))
## [1] 0.179811
# Find the probability that a coin resulting in 18/20 is fair
dbinom(18, 20, .5) / (dbinom(18, 20, .5) + dbinom(18, 20, .75))
## [1] 0.002699252

More Updating with Priors

Suppose we see 16 heads out of 20 flips, which would normally be strong evidence that the coin is biased. However, suppose we had set a prior probability of a 99% chance that the coin is fair (50% chance of heads), and only a 1% chance that the coin is biased (75% chance of heads).

You’ll solve this exercise by finding the exact answer with dbinom() and Bayes’ theorem. Recall that Bayes’ theorem looks like:

$$\Pr(\text{fair} \mid A) = \frac{\Pr(A \mid \text{fair})\,\Pr(\text{fair})}{\Pr(A \mid \text{fair})\,\Pr(\text{fair}) + \Pr(A \mid \text{biased})\,\Pr(\text{biased})}$$

Instructions

Use dbinom() to calculate the probabilities that a fair coin and a biased coin would result in 16 heads out of 20 flips.

Use Bayes’ theorem to find the posterior probability that the coin is fair, given that there is a 99% prior probability that the coin is fair.

Solution

# Use dbinom to find the probability of 16/20 from a fair or biased coin
probability_16_fair <- dbinom(16, 20, .5)
probability_16_biased <- dbinom(16, 20, .75)

# Use Bayes' theorem to find the posterior probability that the coin is fair
(.99 * probability_16_fair) / (.99 * probability_16_fair + .01 * probability_16_biased)
## [1] 0.7068775
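To see how strongly the prior drives the answer, the calculation can be wrapped in a small helper. The function posterior_fair() and its argument names are hypothetical, chosen for this sketch; it assumes the same two coins (50% and 75% heads) and 20 flips used throughout this chapter.

```r
# Hypothetical helper: posterior probability that the coin is fair,
# given the number of heads in `flips` flips and the prior on "fair"
posterior_fair <- function(heads, prior_fair, flips = 20,
                           p_fair = 0.5, p_biased = 0.75) {
  weighted_fair <- prior_fair * dbinom(heads, flips, p_fair)
  weighted_biased <- (1 - prior_fair) * dbinom(heads, flips, p_biased)
  weighted_fair / (weighted_fair + weighted_biased)
}

# Reproduces two of the exercises above
posterior_fair(11, 0.5)    # about 0.855
posterior_fair(16, 0.99)   # about 0.707

# Same data (16 heads), increasingly strong priors on "fair"
sapply(c(0.5, 0.9, 0.99), function(prior) posterior_fair(16, prior))
```

Holding the data fixed at 16 heads, the posterior probability of a fair coin rises as the prior on “fair” increases, which is why evidence that would normally point to a biased coin can still be outweighed by a strong enough prior.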